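Patent abstract:

Publication number: SE0901012A1
Application number: SE0901012
Filing date: 2009-07-20
Publication date: 2011-01-21
Inventors: Fredric Lindstroem; Christian Schueldt; Ingvar Claesson
Applicant: Limes Audio Ab
Main IPC class:
Patent description: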

over the telephone network and is received by the conference telephone, which processes the received B-side speech and presents it on the loudspeaker on the A-side.
In a scenario such as the one described above, there are two types of echo. First, in addition to the B-side talker's signal, the conference telephone may receive a delayed line echo of the A-side talker's speech, originating from the telephone network. Secondly, due to the room acoustics, an acoustic echo will appear on the microphone when speech originating from the B-side is presented on the loudspeaker. Removing these echoes is of utmost importance for listener comfort and system stability (to avoid so-called howling).
Echoes are usually removed by attenuation, by cancellation, or by a combination of the two. The attenuation solution is relatively simple, but when both the A-side and the B-side talkers speak at the same time it lets only one party through. This is called half duplex. Echo cancellation, on the other hand, normally uses one or more adaptive filters to create a model of the echo, which is then subtracted from the microphone signal without disturbing the desired speech. This solution allows simultaneous speech from the A-side and B-side talkers and is called full duplex. In practice, however, echo cancellation does not always succeed in removing all echo. Therefore, a combination of echo cancellation and attenuation is usually used (to remove residual echo that may remain after echo cancellation).
How much the residual echo should be attenuated depends on the situation, but can generally be said to be a function of the ratio between speech and echo. A signal that contains strong speech and weak echo should not be attenuated as much as a signal containing weak speech and strong echo, as the strong speech will mask part of the echo. In addition, the speech should preferably pass through as unaffected by the attenuation as possible to achieve high listening comfort. Estimating the ratio between speech and echo in a signal is a non-trivial problem.
The problem can also be formulated as distinguishing between double talk and an echo path change. A double-talk situation arises when both the A-side and B-side talkers speak at the same time. In a double-talk situation, the signal after echo cancellation consists of a combination of residual echo and speech. In other words, the signal contains more energy than a signal with only residual echo. An echo path change situation means that the feedback properties change. This can occur due to changes in the acoustic environment (e.g. people or objects moving on the A-side) or changes in the telecommunications network (e.g. when a call is connected). The adaptive echo cancellation filter will then produce a larger residual echo until it has had time to adapt to the change. Thus, both double talk and an echo path change result in increased energy at the output of the echo canceller. In the double-talk situation the attenuation should be restrictive, while strong attenuation should be used in the echo path change situation. One problem is thus to distinguish double talk from an echo path change. Another problem is that echo cancellation filters sometimes behave irregularly during and immediately after double talk. This makes it difficult to estimate the actual amount of echo, which can lead to an underestimation of the echo present in these situations. This risk requires an extra margin when estimating the ratio between speech and echo, to minimize the risk of echo being perceived as speech. A disadvantage of this extra margin is that it complicates the detection of real speech on the near side.
Important for distinguishing between double talk and an echo path change, as well as for other applications, is the ability to estimate the stationary noise level and the coupling factor (echo strength). A common method for noise estimation is based on minimum statistics, as described in e.g. "Acoustic Echo and Noise Control: A Practical Approach" by E. Hänsler and G. Schmidt, Wiley, 2004, and in "A Combined Implementation of Echo Suppression, Noise Reduction and Comfort Noise in a Speaker Phone Application" by C. Schüldt, F. Lindström and I. Claesson, in Proceedings of IEEE International Conference on Consumer Electronics, Las Vegas, NV, January 2007. The coupling factor can be estimated, for example, by calculating the ratio of the estimated energies of the loudspeaker and microphone signals, or it can be obtained from the coefficients of the near-side adaptive echo cancellation filter. Details of how the coupling factor can be estimated can be found in, for example, "Step-size control for acoustic echo cancellation filters - an overview" by A. Mader, H. Puder, G.U. Schmidt, Signal Processing, vol. 80, no. 9, pages 1697-1719, 2000.
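As a rough illustration of the two estimates mentioned above, the following Python sketch tracks a noise floor as the sliding-window minimum of a smoothed power signal and computes a coupling factor as a ratio of energies. It is a heavily simplified stand-in for the minimum-statistics methods in the cited literature, not a reproduction of them; the function names, the smoothing constant `alpha` and the window length are assumptions chosen for illustration only.

```python
from collections import deque


def smooth_power(samples, alpha=0.9):
    """Exponentially smoothed short-term power of a sample sequence."""
    p, out = 0.0, []
    for s in samples:
        p = alpha * p + (1.0 - alpha) * s * s
        out.append(p)
    return out


def noise_floor(power, window=8):
    """Simplified minimum-statistics idea: the noise floor is taken as the
    minimum of the smoothed power over the last `window` values."""
    recent, out = deque(maxlen=window), []
    for p in power:
        recent.append(p)
        out.append(min(recent))
    return out


def coupling_factor(speaker_power, mic_power, eps=1e-12):
    """Ratio of microphone energy to loudspeaker energy.
    Assumes the frames contain far-end speech only (no near-end speech)."""
    return sum(mic_power) / (sum(speaker_power) + eps)
```

The energy-ratio estimate is only meaningful when measured while the near side is silent; otherwise near-end speech inflates the microphone energy.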
The distinction between double talk and an echo path change is also important to prevent the divergence of the echo cancellation filters that can occur during double talk.
Thus, the filter update should be halted during double talk. If only one adaptive filter is used and an echo path change is mistaken for double talk, the adaptive filter will not be updated, leading to a deadlock situation. One solution to the deadlock problem is the so-called two-path solution, in which two parallel adaptive echo cancellation filters are used. This solution is explained in more detail in "Echo Canceler with Two Echo Path Models" by K. Ochiai, T. Araseki, and T. Ogihara, IEEE Transactions on Communications, vol. COM-25, no. 6, pages 8-11, June 1977. One filter, often referred to as the background filter, is updated continuously (very often), while the other filter, often referred to as the foreground filter, is updated much less frequently.
For this reason, the foreground filter is sometimes called the fixed filter. The foreground filter, or fixed filter, is the filter that produces the output signal used for echo cancellation. The foreground filter is updated by copying the frequently updated background filter to it when the background filter is considered to perform better in terms of echo cancellation than the foreground filter. This is what happens after an echo path change. In a double-talk situation, however, the background filter will diverge. This does not affect the output of the system, since it is the foreground filter that produces the output.
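The two-filter arrangement described above can be sketched as follows in Python. This is a minimal illustration under stated assumptions, not the scheme of the cited paper: the background filter is adapted with NLMS (one common choice), the copy criterion compares smoothed error powers with a hypothetical `margin`, and all names and constants (`two_path_step`, `mu`, `alpha`, `margin`) are invented for the example.

```python
def dot(a, b):
    """Inner product of two equally long sequences."""
    return sum(p * q for p, q in zip(a, b))


def two_path_step(fg, bg, x_buf, y, state,
                  mu=0.5, eps=1e-6, alpha=0.9, margin=0.5):
    """Process one sample and return (output_error, fg, bg).

    fg, bg : coefficient lists of the foreground and background filters
    x_buf  : most recent loudspeaker samples, newest first
    y      : current microphone sample
    state  : dict holding the smoothed error powers 'pf' and 'pb'
    """
    e_f = y - dot(fg, x_buf)   # foreground error: this is the system output
    e_b = y - dot(bg, x_buf)   # background error: only drives adaptation
    norm = dot(x_buf, x_buf) + eps
    # NLMS update of the continuously adapting background filter.
    bg = [w + mu * e_b * x / norm for w, x in zip(bg, x_buf)]
    state["pf"] = alpha * state["pf"] + (1.0 - alpha) * e_f * e_f
    state["pb"] = alpha * state["pb"] + (1.0 - alpha) * e_b * e_b
    # Copy only when the background filter is clearly the better one
    # (assumed criterion; real systems use more elaborate tests).
    if state["pb"] < margin * state["pf"]:
        fg = list(bg)
        state["pf"] = state["pb"]
    return e_f, fg, bg
```

During double talk the background error power rises, so the copy condition is not met and the foreground filter, which produces the output, is left untouched.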
The conventional methods described above suffer from disadvantages which, in situations that depend on the actual solution, make it difficult to determine the level of attenuation to be applied to the residual echo signal in communication units. There is therefore a need for an alternative solution for controlling residual echo attenuation in communication units.
SUMMARY OF THE INVENTION
It is an object of the present invention to provide means for improved control of residual echo attenuation. This object is achieved by a unit comprising an adaptive foreground filter configured to calculate a first echo estimation signal based on a first input signal, and an adaptive background filter which is updated faster than the foreground filter and is configured to calculate a second echo estimation signal based on said first input signal. The unit further comprises attenuation control means for controlling the attenuation of an echo-cancelled output signal. The unit is characterized in that the attenuation control means is configured to calculate a maximum echo estimation signal using both the first and the second echo estimation signal, and to control the attenuation of the echo-cancelled output signal based on said maximum echo estimation signal and/or a signal derived from said maximum echo estimation signal.
The object is also achieved by a method for attenuation control of an echo-cancelled output signal, comprising the steps of: - calculating, by means of a foreground filter, a first echo estimation signal based on a first input signal, and - calculating, by means of an adaptive background filter which is updated faster than said foreground filter, a second echo estimation signal based on said first input signal. The method is characterized by the steps of: - calculating a maximum echo estimation signal using both said first and said second echo estimation signal, and - controlling the attenuation of said echo-cancelled output signal based on said maximum echo estimation signal and/or a signal derived from said maximum echo estimation signal.
The object of the invention is also achieved by a computer program for the unit described above. The computer program is characterized in that it comprises computer-readable code which, when run by a processing unit in the unit, causes the unit to perform the method described above.
The above-mentioned unit and method can be used both to control the attenuation of residual echo generated in a communication unit, such as a conference telephone, when a microphone of the communication unit picks up a loudspeaker signal that is simultaneously presented by its loudspeaker, and to control the attenuation of residual echo from a network, such as the telephone network, to which communication units are connected.
In the first scenario, hereinafter referred to as the acoustic echo scenario, the first input signal is the loudspeaker signal received by the communication unit and intended to be converted to sound by a loudspeaker. The echo-cancelled output signal is an echo-cancelled microphone signal, constructed by subtracting an acoustic echo estimation signal based on the loudspeaker signal from the microphone signal picked up by one of the communication unit's microphones while the loudspeaker signal is being presented by the loudspeaker.
In the latter scenario, hereinafter referred to as the line echo scenario, the first input signal is the line-out signal transmitted from a first communication unit to a network to which the unit is connected, normally for further transmission to a second communication unit. The echo-cancelled output signal is an echo-cancelled line-in signal, constructed by subtracting a line echo estimation signal based on said line-out signal from a line-in signal to be transmitted to the first communication unit.
In the following, the invention and its advantages will be mainly described in the context of an acoustic echo cancellation scenario. However, it should be understood that the same principles apply mutatis mutandis in the line echo scenario, unless otherwise stated.
By controlling the attenuation of the echo-cancelled microphone signal based on a signal calculated from the echo estimation signals of both the foreground and the background filter, several advantages are achieved. Since the background filter is configured to adapt faster than the foreground filter, meaning that the background filter coefficients converge faster to new values after a change in the filter input signal, the echo estimation signal from the background filter is more reliable in some situations, while the echo estimation signal from the foreground filter is more reliable in others. In general, the background filter generates better echo estimates than the foreground filter in an echo path change situation, while the foreground filter is more reliable than the background filter in a double-talk situation. By controlling the attenuation based on a signal calculated from both, the advantages of the different filters can be exploited for the attenuation control.
According to a preferred embodiment of the invention, the maximum echo estimation signal is calculated based on the power or energy spectral density of the first and second echo estimation signals. This can be accomplished by, for example, rectifying the first and second echo estimation signals and, for a given frequency or frequency range, calculating a peak amplitude value for the maximum echo estimation signal based on the power or energy of whichever of the first and the second rectified echo estimation signals has the greatest power or energy at said frequency or within said frequency range.
According to one aspect of the invention, the maximum echo estimation signal is calculated to correspond to the echo estimation signal having the highest spectral density over the total frequency range of the output signals from the foreground and background filters.
This results in a maximum echo estimation signal corresponding to whichever of the first and second echo estimation signals has the largest total energy. According to another aspect, the energies of the first and second echo estimation signals and of the maximum echo estimation signal are calculated on a subband basis. For example, the spectral density of the first and second echo estimation signals can be calculated on a subband basis and, for each subband, the maximum echo estimation signal can be calculated to correspond to the echo estimation signal having the highest spectral density within that subband. The ratio of speech to echo in the microphone signal can be estimated for the full-band microphone signal based on a single full-band maximum echo estimation signal, or by individual estimates for each of a plurality of subbands of the microphone signal, each based on a subband maximum echo estimation signal calculated for the corresponding subband.
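The subband variant described above amounts to an element-wise maximum over two spectral density estimates. A minimal Python sketch follows, assuming each echo estimate is already available as a list of per-subband power values; the function name and the list representation are illustrative assumptions, not taken from the source.

```python
def subband_max_echo_estimate(psd_f, psd_b):
    """Per-subband maximum of the foreground and background echo
    power estimates (one value per subband in each list)."""
    if len(psd_f) != len(psd_b):
        raise ValueError("both estimates must cover the same subbands")
    return [max(f, b) for f, b in zip(psd_f, psd_b)]
```

For example, with foreground powers `[4.0, 1.0, 0.5]` and background powers `[1.0, 2.0, 0.5]`, the first subband is taken from the foreground estimate and the second from the background estimate.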
Likewise, residual echo attenuation can be performed by applying a single attenuation signal to the echo-cancelled microphone signal, or by applying a number of subband attenuation signals to the echo-cancelled microphone signal. These subband attenuation signals can be calculated from the individually estimated speech-to-echo ratios in the corresponding subband or from a combination of adjacent subbands.
As a result, attenuation can be applied individually to each subband.
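One way to picture per-subband attenuation is the Python sketch below, which turns per-subband speech-to-echo ratios into per-subband gains, optionally averaging each ratio with its adjacent subbands first. The hard threshold, the attenuation floor and the averaging rule are assumptions made for illustration; the text does not prescribe a particular mapping.

```python
def subband_gains(ratios, threshold=2.0, floor=0.1, neighbours=0):
    """Map per-subband speech-to-echo ratios to per-subband gains.

    neighbours : number of adjacent subbands on each side that are
                 averaged with the current subband (assumed scheme)
    """
    gains = []
    for i in range(len(ratios)):
        lo = max(0, i - neighbours)
        hi = min(len(ratios), i + neighbours + 1)
        avg = sum(ratios[lo:hi]) / (hi - lo)
        # Pass the subband when speech dominates, attenuate otherwise.
        gains.append(1.0 if avg > threshold else floor)
    return gains
```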
By making the subbands narrower, the amplitude of the maximum echo estimation signal for a given frequency can be made to correspond to the largest amplitude of the first and second rectified echo estimation signals for the given frequency.
Performing the processing in subbands exploits the spectral characteristics of speech, namely that the energy is concentrated in certain subbands, and can further improve performance compared to the full-band solution.
By calculating a maximum echo estimation signal based on the maximum power or energy of the first and second echo estimation signals according to any of the principles described above, a maximum echo estimation signal is generated which describes a worst-case echo scenario. The maximum echo estimation signal thus avoids underestimating the real echo signal.
At a given time, either the foreground or the background filter may be better configured than the other to estimate the echo at a certain frequency, or within a certain frequency range, while the other filter in turn is better configured to estimate the echo at another frequency, or within another frequency range. By calculating the maximum echo estimation signal on a subband-by-subband basis, or even on a frequency-by-frequency basis, the maximum echo estimation signal will represent a worst-case echo scenario at any given frequency or frequency range.
In a communication system, it is important not to attenuate a signal that carries speech, so as not to impair the usefulness of the system to the user. However, it is generally considered more acceptable to apply attenuation to a speech-carrying signal for a short period of time than to fail to apply attenuation to a signal that carries nothing but echo. Loss of speech or decreased speech volume over a period of a few milliseconds is usually experienced as less annoying by a user than the sound of echo during the same period. By calculating the maximum echo estimation signal in the way proposed above, the risk of not attenuating a microphone signal that carries nothing but echo is minimized, because the attenuation is controlled based on a "worst possible echo scenario".
In an echo path change situation caused by, for example, the movement of people or objects in a room, the background filter will usually adapt quickly to the new acoustic environment and deliver a good estimate of the echo picked up by the microphone. For example, in an echo path change situation where the actual echo energy increases, the adaptive background filter will adapt to the new echo path and generate a correspondingly increased echo energy estimate, while the fixed foreground filter will continue to generate low echo energy estimates in accordance with the previous echo path. In a double-talk situation, on the other hand, the adaptive background filter will diverge and cancel near-end speech, which may result in an underestimation of the echo energy, while the fixed foreground filter will generate a much more accurate estimate. Using the maximum energy of the echo estimation signals will sometimes yield an overestimation of the echo, but never an underestimation. Overestimating the echo will result in more echo attenuation than necessary. However, it is considered acceptable to apply stronger attenuation than necessary to the echo-cancelled microphone signal in this case, especially since the increase in attenuation is caused by an echo path change situation and not a double-talk situation. Furthermore, controlling the attenuation of the echo-cancelled microphone signal based on a signal calculated as the maximum of the first and second echo estimation signals, as described above, is advantageous in situations where the rapidly updating background filter needs to be reset, which may occur from time to time. After the background filter has been reset, it will take some time for the filter coefficients to reach their correct values, and in the meantime the background filter will generate too low an echo estimate.
However, the maximum echo estimation signal will in this situation correspond to the echo estimation signal from the foreground filter and thus maintain an acceptable level.
Suitably, the attenuation control means is configured to control the attenuation of the echo-cancelled output signal based on the maximum echo estimation signal and a second input signal. In the acoustic echo scenario, the second input signal is the microphone signal, and in the line echo scenario, the second input signal is the line-in signal. This can be accomplished by, for example, controlling the attenuation depending on a signal obtained by subtracting the energy of the maximum echo estimation signal from the energy of the second input signal, or depending on a signal obtained as the ratio of the energy of the second input signal to the energy of the maximum echo estimation signal. If the ratio of the energy of the second input signal to the energy of the maximum echo estimation signal is close to one, the second input signal probably contains nothing other than echo (acoustic echo in the acoustic echo scenario and line echo in the line echo scenario), whereupon the attenuation control means can apply strong attenuation to the echo-cancelled output signal. If, on the other hand, the ratio is much larger than one, the second input signal probably contains speech, whereupon the attenuation of the echo-cancelled signal is limited.
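The ratio-based decision described above can be sketched in Python as follows. The text only states that a ratio close to one calls for strong attenuation and a ratio much larger than one for limited attenuation; the concrete values (`strong`, `speech_ratio`) and the linear interpolation between them are assumptions made for this example.

```python
def energy(frame):
    """Sum of squared samples of a frame."""
    return sum(s * s for s in frame)


def attenuation_gain(second_input, max_echo,
                     strong=0.1, speech_ratio=4.0, eps=1e-12):
    """Gain for the echo-cancelled output, from the energy ratio between
    the second input signal and the maximum echo estimate."""
    ratio = energy(second_input) / (energy(max_echo) + eps)
    if ratio <= 1.0:
        return strong            # probably nothing but echo
    if ratio >= speech_ratio:
        return 1.0               # probably speech: do not attenuate
    # In between, interpolate linearly (assumed, not from the source).
    t = (ratio - 1.0) / (speech_ratio - 1.0)
    return strong + t * (1.0 - strong)
```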
According to a refined embodiment of the invention, the attenuation control means is configured to calculate a minimum residual echo estimation signal by subtracting the energy of the maximum echo estimation signal from the energy of the second input signal, to calculate a maximum residual echo estimation signal based on the first input signal and a coupling factor, and to control the attenuation of the echo-cancelled output signal based on a comparison between the minimum residual echo estimation signal and the maximum residual echo estimation signal. In the acoustic echo scenario, the coupling factor is usually referred to as the acoustic coupling and is a measure of the dependence between the loudspeaker signal and the microphone signal. In the line echo scenario, the coupling factor is a measure of the dependence between the line-out signal and the line-in signal. If the ratio between the minimum and maximum residual echo estimation signals is high, the probability that the second input signal contains speech is high. If, on the other hand, the ratio is low, it is probable that the second input signal does not contain anything other than echo, and the echo-cancelled output signal can thus be strongly attenuated.
The calculations described above can be performed on a subband basis, which means that some or all of the calculations can be performed individually for one or more subbands of the processed signals. Attenuation of the echo-cancelled output signal, so-called residual echo suppression or residual echo attenuation, can then be controlled based on the results of the calculations performed for one or more of the subbands. Attenuation can be applied, for example, to different subbands or groups of subbands individually by applying different attenuation signals to different subbands or groups of subbands.
Further advantageous embodiments of the unit, the method and the computer program according to the invention are described in the detailed description of the invention which follows.
GENERAL DESCRIPTION OF THE FIGURES
A more complete understanding of the invention described herein will be obtained by reference to the following detailed description when considered in conjunction with the accompanying figures, which are briefly explained below.
Fig. 1 is a schematic block diagram illustrating a conference telephone;
Fig. 2 is a block diagram illustrating an acoustic echo cancellation procedure performed by a digital signal processor (DSP) in the conference telephone shown in Fig. 1;
Fig. 3 is a block diagram showing the procedure for estimating the ratio of speech to echo performed by the DSP shown in Fig. 2;
Figs. 4A-B illustrate an exemplary method of calculating a maximum echo estimation signal according to an embodiment of the invention;
Figs. 5A-B and 6A-B illustrate another exemplary method of calculating a maximum echo estimation signal according to an embodiment of the invention, and
Fig. 7 shows a conference telephone containing a computer program according to the invention.
DETAILED DESCRIPTION OF THE INVENTION
In the following description of the invention, reference will be made to a loudspeaker conference telephone. It should be noted, however, that a loudspeaker conference telephone is only one example of a communication unit to which the present invention is applicable. Other examples of communication units for which the principles of the invention could be useful are integrated car telephones and conventional mobile telephones set in loudspeaker mode.
Fig. 1 shows a block diagram of a conference telephone 1. An analog line-in signal carrying speech from the B-side, together with line echo, is received from the telephone network through an input of the conference telephone (not shown). The analog line-in signal is converted into a discrete signal, l(k), by an analog-to-digital (A/D) converter 3.
The signal l(k) is then fed to a digital signal processor (DSP) 5, which processes the signal in an attempt to reduce line echo while retaining speech from the B-side, and generates an output signal in the form of a loudspeaker signal x(k). The loudspeaker signal x(k) is converted into an analog signal by a digital-to-analog (D/A) converter 7, amplified by an amplifier 9, and fed to a loudspeaker 11. The loudspeaker 11 consequently presents processed B-side speech to the A-side. Near-end speech, s, and acoustic echo, a, from the A-side are picked up by a microphone 13, amplified by an amplifier 15, and converted by an A/D converter into a digital microphone signal, y(k), which is an input signal to the DSP 5. The DSP processes the microphone signal y(k) in an attempt to reduce the acoustic echo a while preserving the speech, s, from the A-side, resulting in an output signal o(k). The output signal o(k) is then converted to an analog signal by a D/A converter 19 and sent to the telephone network.
Fig. 2 shows a block diagram illustrating the acoustic echo cancellation procedure performed by the DSP 5 in Fig. 1. The signal l(k) from the B-side constitutes, after any processing performed by signal processor 21, the loudspeaker signal x(k), which is sent to the loudspeaker (11 in Fig. 1) and to two adaptive echo cancellation filters 23, 25 in the DSP 5. Filter 23 is a foreground filter configured to produce a first echo estimation signal, â_f(k), based on the received loudspeaker signal x(k). Filter 25 is a background filter configured to produce a second echo estimation signal, â_b(k), based on the received loudspeaker signal x(k). The background filter 25 is updated faster than the foreground filter 23. The output signal, â_f(k), from the foreground filter 23 and the output signal, â_b(k), from the background filter 25 are subtracted from the microphone signal y(k) by means of two subtractors 27, 29, forming a foreground filter feedback signal, e_f(k), and a background filter feedback signal, e_b(k). The background filter feedback signal, e_b(k), is used to update the background filter 25, and the foreground filter feedback signal, e_f(k), can be used to update the foreground filter 23. However, the foreground filter 23 does not need to be implemented as a self-updating filter.
Instead, according to another embodiment of the invention, the foreground filter 23 is a fixed filter, in the sense that it is not configured to update itself. The DSP 5 may include filter update means (not shown) configured to update the foreground filter 23 by copying the background filter 25 to the foreground filter 23 when a certain criterion is met. Usually such a criterion is chosen so that the background filter 25 is copied to the foreground filter 23 when the background filter 25 is considered to perform better in terms of echo cancellation than the foreground filter 23.
The DSP 5 further includes a speech-to-echo ratio estimator 31, which is configured to calculate an estimate of the ratio of near-end speech to echo, r(k). To achieve this, the speech-to-echo ratio estimator 31 uses the loudspeaker signal x(k), the microphone signal y(k), the first echo estimation signal â_f(k) from the foreground filter 23 and the second echo estimation signal â_b(k) from the background filter 25. The estimate of the ratio of near-end speech to echo, r(k), from the speech-to-echo ratio estimator 31 is sent to a gain calculator 33, which is configured to produce a gain/attenuation signal, g(k), based on r(k). The gain/attenuation signal, g(k), is in turn sent to a residual echo processing unit 35, which also receives the echo-cancelled microphone signal, e_f(k), which corresponds to the feedback signal of the foreground filter.
The residual echo processing unit 35 is configured to determine the attenuation of the foreground filter feedback signal, e_f(k), based on the received gain/attenuation signal. The gain calculator 33 may be configured to calculate the gain/attenuation according to any known principle for determining the gain/attenuation of an echo-cancelled microphone signal. For example, a simple gain calculator 33 may be configured to calculate the gain/attenuation as follows:

$$g(k+1) = \lambda\, g(k) + (1-\lambda)\, g_c(k) \qquad (1)$$

where g_c(k) = 1 if r(k) > T and g_c(k) = 0 if r(k) ≤ T, where T is a predetermined threshold value, and λ = λ1 if g_c(k) > g(k) and λ = λ2 if g_c(k) ≤ g(k), where λ1 and λ2 are smoothing factors that determine the rise and fall times of the gain/attenuation smoothing. The residual echo processing unit 35 then applies the gain/attenuation signal g(k) to the feedback signal of the foreground filter, e_f(k), to generate the output signal o(k).
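A single update of the gain smoothing can be sketched in Python as below, assuming equation (1) is the first-order recursion g(k+1) = λ g(k) + (1 - λ) g_c(k) with a hard-decision target gain g_c(k). The default values for the threshold and the two smoothing factors are placeholders chosen for illustration, not values from the source.

```python
def update_gain(g, r, threshold=2.0, lam_rise=0.5, lam_fall=0.9):
    """Return g(k+1) from the current gain g(k) and the ratio r(k)."""
    # Target gain: pass the signal when the speech-to-echo ratio
    # exceeds the threshold T, otherwise attenuate fully.
    g_c = 1.0 if r > threshold else 0.0
    # Separate smoothing factors for rising and falling gain.
    lam = lam_rise if g_c > g else lam_fall
    return lam * g + (1.0 - lam) * g_c
```

With these placeholder values the gain rises faster than it falls (smaller λ on the way up), so speech onsets are let through quickly while the gain decays more slowly once the ratio drops below the threshold.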
Thus, in this exemplary embodiment, the echo-cancelled microphone signal to which attenuation is applied corresponds to the foreground filter feedback signal, e_f(k), which is obtained by subtracting the first echo estimation signal, â_f(k), output from the foreground filter 23, from the microphone signal y(k). It should be noted, however, that the invention is not limited to the use of a particular echo-cancelled microphone signal. For example, the invention is equally applicable if the analog signal sent to the network (see Fig. 1) is based on the background filter feedback signal, e_b(k), instead of the foreground filter feedback signal, e_f(k), or on an echo-cancelled microphone signal created by subtracting a combination of the first and second echo estimation signals from the microphone signal. That is, the invention lies not in the nature of the echo cancellation but in how the attenuation of whichever echo-cancelled signal is used as an output signal by the communication unit 1 is controlled.
Fig. 3 shows a more detailed block diagram of the processing performed by the speech-to-echo ratio estimator 31 shown in Fig. 2.
In steps S1 and S2, the first and second echo estimation signals, â_f(k) and â_b(k), are rectified and filtered to form a first rectified and filtered echo estimation signal, â_f_filt(k), and a second rectified and filtered echo estimation signal, â_b_filt(k). This can be achieved by, for example, moving averages or exponential recursive weighting.
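Rectification followed by exponential recursive weighting can be sketched in Python as follows. The smoothing constant `alpha` and the zero initial level are assumptions; a moving average over a fixed window would be an equally valid choice according to the text.

```python
def rectify_and_smooth(signal, alpha=0.9):
    """Rectify each sample (absolute value) and smooth the result with
    a first-order recursive filter: level = alpha*level + (1-alpha)*|s|."""
    level, out = 0.0, []
    for s in signal:
        level = alpha * level + (1.0 - alpha) * abs(s)
        out.append(level)
    return out
```

A larger `alpha` gives a smoother but slower-reacting envelope.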
In step S3, the maximum of â_f_filt(k) and â_b_filt(k) is calculated, which forms a maximum echo estimation signal â_max(k).
In step S4, the maximum echo estimation signal â_max(k) is filtered using, for example, a moving average or exponential recursive weighting, which results in a filtered maximum echo estimation signal â_max_filt(k).
In parallel with steps S1 to S4, the microphone signal, y(k), is rectified and filtered in a step S5 by, for example, a moving average or exponential recursive weighting, which forms a rectified and filtered microphone signal y_filt(k).
In step S6, the filtered maximum echo estimation signal, â_max_filt(k), is subtracted from the rectified and filtered microphone signal, y_filt(k), which forms a minimum residual echo estimation signal, e_min(k). This signal, e_min(k), can be seen as a smoothed average of combined near-end speech and residual echo.
In parallel with steps S1 to S6, the loudspeaker signal, x(k), is rectified and filtered in a step S7 by, for example, a moving average or exponential recursive weighting, which forms a rectified and filtered loudspeaker signal, x_filt(k). In a parallel step S8, an acoustic coupling factor, c(k), is estimated based on the loudspeaker signal, x(k), and the microphone signal, y(k). In a step S9, the rectified and filtered loudspeaker signal, x_filt(k), generated in step S7 is multiplied by the acoustic coupling factor, c(k), to create a signal, e_c(k), which can be seen as a "worst-case estimate" of the acoustic echo. In a step S10, a noise estimation signal, n(k), is created based on the microphone signal, y(k). In a step S11, a maximum residual echo estimation signal, e_max(k), is calculated as the maximum of the signal, e_c(k), created in step S9 and the noise estimation signal, n(k), estimated in step S10. Finally, in a step S12, a signal, r(k), which describes the ratio of near-end speech to echo, is created by dividing the minimum residual echo estimation signal, e_min(k), created in step S6 by the maximum residual echo estimation signal, e_max(k), calculated in step S11. The signal, r(k), is then sent to the gain calculator 33 shown in Fig. 2, and is then used to control the attenuation of the echo-cancelled microphone signal e_f(k).
Accordingly, according to the Proposed Method, the attenuation of the echo-extinguished microphone signal, ef fl c), is based on the signal, r (k), which describes the ratio of near-side number to echo, which in turn is based on the maximum residual echo estimation signal, emax (k).
Steps S1 to S12 can be performed in several different ways. As mentioned earlier, the signals can be divided into different frequency bands and the processing can be performed in individual frequency bands, or the processed signals can be full band signals or a set of subband signals, where a given processed signal can be computed using one, several or all input signals.
Figs. 4A and 4B show a typical manner in which the maximum echo estimation signal, âmax(k), can be calculated based on the first and second echo estimation signals, âf(k) and âb(k). The graphs in Fig. 4A show examples of a first and a second echo estimation signal after being rectified and filtered in steps S1 and S2 in Fig. 3. The graphs show the power of the rectified echo estimation signals, âf_filt(k) and âb_filt(k), as a function of frequency over the relevant frequency range, and consequently illustrate the spectral density of each signal. The relevant frequency range is typically the frequency range where speech can occur. In this embodiment, the speech-to-echo ratio estimator 31 (see Fig. 2) is configured to calculate the integral (sum) of each rectified echo estimation signal, âf_filt(k) and âb_filt(k), over the entire relevant frequency range, that is, the area in the xy plane bounded by the respective signal. The integral (sum) of the rectified first echo estimation signal, âf_filt(k), from the foreground filter 23 is denoted A_F, and the integral (sum) of the rectified second echo estimation signal, âb_filt(k), from the background filter 25 is denoted A_B. The integrals (sums) of the first and second rectified echo estimation signals indicate their respective energy content.
In this typical case, the integral (sum), A_B, of the rectified second echo estimation signal, âb_filt(k), is greater than the integral (sum), A_F, of the rectified first echo estimation signal, âf_filt(k), indicating that the second echo estimation signal, âb(k), from the background filter 25 contains more energy than the first echo estimation signal, âf(k), from the foreground filter 23. The speech-to-echo ratio estimator 31 is configured to compare the integrals (sums), A_F and A_B, that is, the energy in the first and second echo estimation signals, âf(k) and âb(k), and to set the maximum echo estimation signal, âmax(k), equal to whichever of the rectified echo estimation signals, âf_filt(k) or âb_filt(k), contains the most energy. Fig. 4B shows the maximum echo estimation signal, âmax(k), calculated from the first and second rectified echo estimation signals, âf_filt(k) and âb_filt(k), shown in Fig. 4A, according to the principles described above. In this case, the maximum echo estimation signal, âmax(k), thus corresponds to the rectified second echo estimation signal, âb_filt(k).
Figs. 5A and 5B show another exemplary way in which the maximum echo estimation signal, âmax(k), can be calculated based on the first and second echo estimation signals, âf(k) and âb(k). The graphs in Fig. 5A show the rectified first and second echo estimation signals, âf_filt(k) and âb_filt(k), and Fig. 5B shows the maximum echo estimation signal, âmax(k), calculated from them when the calculations are performed on a subband basis. In this embodiment, the speech-to-echo ratio estimator 31 is configured to calculate the integral (sum) of each of the first and second rectified echo estimation signals within a certain frequency range or subband, hereinafter referred to as FR.
The maximum echo estimation signal, âmax(k), is calculated on a subband basis by, for each subband, setting the maximum echo estimation signal, âmax(k), to correspond to whichever rectified echo estimation signal, âf_filt(k) or âb_filt(k), has the largest integral (sum) within the given subband. For example, in the subband between the frequencies f_i and f_i+1, within which the integral (sum) A_F,i of the rectified first echo estimation signal, âf_filt(k), is greater than the integral (sum) A_B,i of the rectified second echo estimation signal, âb_filt(k), the maximum echo estimation signal, âmax(k), is set to correspond to the rectified first echo estimation signal, âf_filt(k). In all other subbands, given this exemplary subband width, the integral (sum) of the rectified second echo estimation signal, âb_filt(k), is greater than the integral (sum) of the rectified first echo estimation signal, âf_filt(k), and therefore the maximum echo estimation signal, âmax(k), in these subbands corresponds to the rectified second echo estimation signal, âb_filt(k). It should be noted that the only difference between this procedure for calculating the maximum echo estimation signal, âmax(k), and the procedure according to Figs. 4A and 4B is that the frequency range, FR, in the latter case can be seen to correspond to the entire relevant frequency range. Thus, Figs. 4A and 4B illustrate an embodiment in which the attenuation is controlled on a full band basis, and Figs. 5A and 5B illustrate an embodiment in which the attenuation is controlled on a subband basis.
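The band-wise selection described for Figs. 4A to 5B can be sketched as below. This is an illustrative sketch, not the claimed implementation: it assumes the rectified and filtered spectra are available as arrays with one power value per frequency bin, and the function name, the `band_edges` parameter and the tie-breaking rule (ties go to the background estimate) are hypothetical.

```python
import numpy as np

def max_echo_estimate(a_f, a_b, band_edges):
    """Per band, keep the spectrum whose integral (sum) over the band is larger.

    a_f, a_b   : foreground / background power spectra, one value per bin
    band_edges : increasing bin indices delimiting the bands, e.g. [0, 2, 4]
    """
    a_f = np.asarray(a_f, dtype=float)
    a_b = np.asarray(a_b, dtype=float)
    a_max = np.empty_like(a_f)
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        # Integral (sum) of each spectrum within the band, as in Fig. 5A
        if a_f[lo:hi].sum() > a_b[lo:hi].sum():
            a_max[lo:hi] = a_f[lo:hi]
        else:
            a_max[lo:hi] = a_b[lo:hi]
    return a_max
```

Passing a single band that spans all bins, for example `band_edges=[0, len(a_f)]`, gives the full band behaviour of Figs. 4A and 4B, which mirrors the remark that the two procedures differ only in the choice of FR.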
Figs. 6A and 6B show what happens when the subband width, that is, the width of the subbands for which the maximum echo estimation signal, âmax(k), is calculated as above, goes towards zero. If the speech-to-echo ratio estimator 31 is configured to calculate the integrals (sums) over very narrow subbands, the power of the maximum echo estimation signal, âmax(k), at each given frequency will correspond to the power of whichever rectified echo estimation signal, âf_filt(k) or âb_filt(k), has the greatest power at that particular frequency. In this case, the maximum echo estimation signal, âmax(k), is a true measure of the maximum energy content of the first and second echo estimation signals, âf(k) and âb(k).
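In the limiting case of Figs. 6A and 6B, where each subband shrinks to a single frequency, the selection becomes an element-wise maximum. A minimal sketch, assuming one power value per frequency bin and a hypothetical function name:

```python
import numpy as np

def max_echo_estimate_per_bin(a_f, a_b):
    """Element-wise maximum of the foreground and background power spectra."""
    return np.maximum(np.asarray(a_f, dtype=float),
                      np.asarray(a_b, dtype=float))
```

By construction the result is never below either input at any bin, which is why it acts as an upper envelope of the two echo estimates.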
It is to be understood that the methods described above for creating the maximum echo estimation signal, âmax(k), from the maximum of the rectified echo estimation signals, âf_filt(k) and âb_filt(k), are only examples. One skilled in the art will recognize that there are other ways in which the speech-to-echo ratio estimator 31 can be configured to produce similar results. For example, the maximum echo estimation signal, âmax(k), can be calculated by comparing the power of the first and second rectified echo estimation signals, âf_filt(k) and âb_filt(k), at a number of discrete frequencies, setting the power of the maximum echo estimation signal, âmax(k), at a given frequency to that of whichever rectified echo estimation signal, âf_filt(k) or âb_filt(k), has the greatest power at this frequency, and then interpolating between the determined power/frequency values of the maximum echo estimation signal, âmax(k).
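The interpolation variant mentioned above can be sketched as follows. The text does not say which interpolation is meant, so linear interpolation via `numpy.interp` is an assumption here, as are the function and parameter names. `sample_freqs` must be increasing and is assumed to lie within the range of `freqs`.

```python
import numpy as np

def max_echo_estimate_interpolated(freqs, a_f, a_b, sample_freqs):
    """Compare the spectra at a few discrete frequencies, then interpolate.

    freqs        : increasing frequency grid of the spectra a_f and a_b
    sample_freqs : increasing subset of frequencies where the comparison is made
    """
    freqs = np.asarray(freqs, dtype=float)
    sample_freqs = np.asarray(sample_freqs, dtype=float)
    # Power of each rectified echo estimate at the discrete frequencies
    p_f = np.interp(sample_freqs, freqs, a_f)
    p_b = np.interp(sample_freqs, freqs, a_b)
    # Keep the larger power at each discrete frequency
    p_max = np.maximum(p_f, p_b)
    # Linear interpolation (assumed) between the selected power/frequency points
    return np.interp(freqs, sample_freqs, p_max)
```

Comparing at only a few frequencies reduces the number of comparisons, at the cost of possibly missing narrow peaks that fall between the sampled frequencies.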
One skilled in the art will also appreciate that the above reasoning is equally applicable to the processing in steps S1, S2 and S4 to S12.
The steps for calculating the maximum echo estimation signal, âmax(k), and for controlling the attenuation of the echo-cancelled microphone signal, ef(k), based on the maximum echo estimation signal, âmax(k), are preferably performed using a computer program.
Fig. 7 shows a conference telephone 1 comprising a loudspeaker 11 and a microphone 13.
The conference telephone further comprises a processing unit 37, which may be, but need not be, the DSP 5 in Fig. 1, and a computer readable medium 39, such as a hard disk or other non-volatile memory for storing digital information. The computer readable medium 39 is shown storing a computer program 41 comprising computer readable code which, when executed by the processing unit 37, causes the DSP 5 to control the attenuation of the echo-cancelled microphone signal, ef(k), according to the principles described herein. It should be noted that the functionality for controlling the attenuation of acoustic residual echo, as described above, can instead be placed in a separate attenuation unit located in the telephone network rather than in the conference telephone itself. In this case, the acoustic echo cancellation procedure shown in Fig. 2 is carried out by the network attenuation unit, which is configured to forward the speaker signal, x(k), to the communication unit for which it is intended, and to receive the microphone signal, y(k), from it.
In addition, as mentioned in the introductory part, it should also be noted that the invention can be used to control attenuation of line residual echoes, i.e. echoes coming from the telephone network. The description of such an embodiment is similar to the description above, which explains an acoustic echo scenario. The line echo scenario can be easily understood by replacing, in the figures, the speaker signal, x(k), with the line-out signal, o(k), the microphone signal, y(k), with the line-in signal, l(k), the line-out signal, o(k), with the speaker signal, x(k), and the line-in signal, l(k), with the microphone signal, y(k).
Claims (20):
[1]
A unit (1) comprising: - an adaptive foreground filter (23) configured to calculate a first echo estimation signal [âf(k)] based on a first input signal [x(k); o(k)], - an adaptive background filter (25) which is updated faster than said foreground filter (23) and is configured to calculate a second echo estimation signal [âb(k)] based on said first input signal [x(k); o(k)], and - attenuation control means (31, 33, 35) for controlling attenuation of an echo-cancelled output signal [ef(k)], characterized in that said attenuation control means (31) is configured to calculate a maximum echo estimation signal [âmax(k)] using both said first [âf(k)] and said second [âb(k)] echo estimation signals, and to control the attenuation of the echo-cancelled output signal [ef(k)] based on said maximum echo estimation signal [âmax(k)] and/or a signal derived from said maximum echo estimation signal [âmax(k)].
[2]
A unit (1) according to claim 1, wherein said attenuation control means (31) is configured to calculate said maximum echo estimation signal [âmax(k)] based on the power or energy spectral density of said first [âf(k)] and second [âb(k)] echo estimation signals.
[3]
A unit (1) according to claim 2, wherein said attenuation control means (31) is configured to calculate said maximum echo estimation signal [âmax(k)] by, for a given frequency or given frequency range (FR), calculating an amplitude of the maximum echo estimation signal [âmax(k)] based on the power or energy of whichever of said first [âf(k)] and second [âb(k)] echo estimation signals has the greatest power or energy at said frequency or within said frequency range (FR).
[4]
A unit (1) according to any one of the preceding claims, wherein said attenuation control means (31) is configured to control the attenuation of said echo-cancelled output signal [ef(k)] based on said maximum echo estimation signal [âmax(k)] and a second input signal [y(k); l(k)].
[5]
The unit (1) according to claim 4, wherein said attenuation control means (31) is further configured to calculate a minimum residual echo estimation signal [emin(k)] by subtracting said maximum echo estimation signal [âmax(k)] from said second input signal [y(k); l(k)], and to control the attenuation of said echo-cancelled output signal [ef(k)] based on said minimum residual echo estimation signal [emin(k)].
[6]
The unit (1) according to claim 5, wherein said attenuation control means (31) is further configured to calculate a maximum residual echo estimation signal [emax(k)] based on said first input signal [x(k); o(k)] and a coupling factor [c(k)], and to control the attenuation of the echo-cancelled output signal [ef(k)] based on a comparison between said minimum residual echo estimation signal [emin(k)] and said maximum residual echo estimation signal [emax(k)].
[7]
A unit (1) according to claim 6, wherein said attenuation control means (31) is configured to calculate said maximum residual echo estimation signal [emax(k)] as a combination of: - a residual echo estimation signal [ec(k)], which in turn is calculated based on said first input signal [x(k); o(k)] and said coupling factor [c(k)], and - a noise estimation signal [n(k)], which in turn is calculated based on said second input signal [y(k); l(k)].
[8]
A unit (1) according to any one of the preceding claims, wherein said unit (1) is configured to perform all or some of the calculations for given frequency subbands of the processed signals, so that the residual echo attenuation [g(k)] can be performed in frequency subbands or in full band based on some or all of the frequency bands used.
[9]
A unit (1) according to any one of the preceding claims, further comprising filter updating means configured to update the foreground filter (23) by copying the background filter (25) to the foreground filter (23) when a certain criterion is met.
[10]
A unit (1) according to any one of the preceding claims, wherein said unit is a communication unit, such as a conference telephone or an integrated car telephone, and comprises a speaker (11) for converting the first input signal [x(k); o(k)] to sound when said first input signal is a speaker signal [x(k)], and a microphone (13) for converting sound to a second input signal [y(k); l(k)] in the form of a microphone signal [y(k)].
[11]
A method of controlling attenuation of an echo-cancelled output signal [ef(k)], comprising the steps of: - calculating, by means of a foreground filter (23), a first echo estimation signal [âf(k)] based on a first input signal [x(k); o(k)], - calculating, by means of a background filter (25) which is updated faster than said foreground filter (23), a second echo estimation signal [âb(k)] based on said first input signal [x(k); o(k)], characterized by the steps of: - calculating (S3) a maximum echo estimation signal [âmax(k)] using both said first [âf(k)] and said second [âb(k)] echo estimation signals, and - controlling the attenuation of said echo-cancelled output signal [ef(k)] based on said maximum echo estimation signal [âmax(k)] and/or a signal derived from said maximum echo estimation signal [âmax(k)].
[12]
The method of claim 11, wherein the maximum echo estimation signal [âmax(k)] is calculated based on the power or energy spectral density of said first [âf(k)] and second [âb(k)] echo estimation signals.
[13]
A method according to claim 12, wherein the step of calculating the maximum echo estimation signal [âmax(k)] is performed by, for a given frequency or given frequency range (FR), calculating an amplitude of the maximum echo estimation signal [âmax(k)] based on the power or energy of whichever of said first [âf(k)] and second [âb(k)] echo estimation signals has the greatest power or energy at said frequency or within said frequency range (FR).
[14]
A method according to any one of claims 11 to 13, wherein the step of controlling the attenuation of the echo-cancelled output signal [ef(k)] is performed based on said maximum echo estimation signal [âmax(k)] and a second input signal [y(k); l(k)].
[15]
The method of claim 14, wherein the step of controlling the attenuation of the echo-cancelled output signal [ef(k)] is preceded by a step in which a minimum residual echo estimation signal [emin(k)] is calculated by subtracting said maximum echo estimation signal [âmax(k)] from said second input signal [y(k); l(k)], and wherein the step of controlling the attenuation of the echo-cancelled output signal [ef(k)] is performed based on said minimum residual echo estimation signal [emin(k)].
[16]
A method according to claim 15, wherein the step of controlling the attenuation of the echo-cancelled output signal [ef(k)] is preceded by a step in which a maximum residual echo estimation signal [emax(k)] is calculated based on said first input signal [x(k); o(k)] and a coupling factor [c(k)], and wherein the step of controlling the attenuation of the echo-cancelled output signal [ef(k)] is performed based on a comparison between said minimum residual echo estimation signal [emin(k)] and said maximum residual echo estimation signal [emax(k)].
[17]
A method according to claim 16, wherein said maximum residual echo estimation signal [emax(k)] is calculated as a combination of: - a residual echo estimation signal [ec(k)], which in turn is calculated based on said first input signal [x(k); o(k)] and said coupling factor [c(k)], and - a noise estimation signal [n(k)], which in turn is calculated based on said second input signal [y(k); l(k)].
[18]
A method according to any one of claims 11 to 17, wherein all or some of the calculations are performed for given frequency subbands of the processed signals, so that the residual echo attenuation [g(k)] can be performed in frequency subbands or in full band based on some or all of the frequency bands used.
[19]
A computer program (41) for a unit (1) according to any one of claims 1 to 10, characterized in that said computer program (41) consists of computer-readable code which, when run by a processing unit (37) in the unit (1), causes the unit (1) to perform the method according to any one of claims 11 to 18.
[20]
A computer program product consisting of a computer readable medium (39) and computer readable code stored on the computer readable medium (39), characterized in that the computer readable code is the computer program (41) according to claim 19.
Patent family:
Publication number | Publication date
US8693678B2 | 2014-04-08
SE533956C2 | 2011-03-15
WO2011010960A1 | 2011-01-27
US20120183133A1 | 2012-07-19
Priority:
Application number | Publication number | Priority date | Filing date | Patent title
SE0901012A | SE533956C2 | 2009-07-20 | 2009-07-20 | Device and method for controlling damping of residual echo
US13/384,554 | US8693678B2 | 2009-07-20 | 2010-06-17 | Device and method for controlling damping of residual echo
PCT/SE2010/050676 | WO2011010960A1 | 2009-07-20 | 2010-06-17 | Device and method for controlling damping of residual echo